
This is a relatively quick prediction, and the three-dimensional coordinates are then available for the user to download. However, it requires a protein with a known three-dimensional structure as a template, from which it calculates how the structure of the user’s sequence differs. Whether a template can be found is determined by a special sequence comparison with the proteins in the SWISS-MODEL database.

SWISS-MODEL is a very solid, fast and frequently confirmed approach for determining a three-dimensional structure from a protein template. However, there are many other, often much more complex ways of calculating the protein structure (e.g. homology modelling with MODELLER):

https://salilab.org/modeller/tutorial/

Since a structure that can serve as a template is not always available, so-called ab initio and optimization algorithms calculate an approximate solution for the structure from the sequence alone by minimizing the free enthalpy. Prominent approaches here are neural networks, evolutionary algorithms and Monte Carlo simulations. One example is the QUARK server from the Zhang lab:

https://zhanglab.ccmb.med.umich.edu/QUARK/
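
To make the idea of energy minimization by Monte Carlo simulation more concrete, here is a minimal sketch in Python. It is not the method used by QUARK or any other server: the energy function `toy_energy`, the preferred angle of -60 degrees, the step size and the temperature are all hypothetical choices made only for illustration. A real ab initio method uses a far more elaborate energy function and move set.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: every angle 'prefers' -60 degrees (arbitrary choice)."""
    target = math.radians(-60.0)
    return sum(1.0 - math.cos(a - target) for a in angles)


def metropolis_minimize(n_angles=10, steps=20000, temperature=0.1, seed=0):
    """Search for a low-energy set of angles with the Metropolis criterion."""
    rng = random.Random(seed)
    # Start from a random conformation.
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    energy = toy_energy(angles)
    start_energy = energy
    best_angles, best_energy = list(angles), energy

    for _ in range(steps):
        # Propose a small random change to one angle.
        i = rng.randrange(n_angles)
        old_value = angles[i]
        angles[i] = old_value + rng.gauss(0.0, 0.3)
        delta = toy_energy(angles) - energy

        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old_value  # reject: restore the previous angle

    return start_energy, best_energy, best_angles


start_energy, best_energy, best_angles = metropolis_minimize()
print(f"energy at start: {start_energy:.3f}, best energy found: {best_energy:.3f}")
```

Occasionally accepting uphill moves is what lets such a search escape local minima; with a realistic, rugged energy landscape this matters far more than in this smooth toy example.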

Marking of the Known Structural Parts in the Protein Sequence

For independent verification, our chair offers a tool that labels the known three-dimensional structural domains in any sequence (the technical term is domain annotation, which is why our tool is called “AnDom”). This is a slightly different procedure and works for any sequence: it simply checks whether at least a small piece of the sequence is similar to a known three-dimensional protein structure. It is thus completely independent of the ExPASy predictions and can be used to check them. In general, independent databases and software from different authors and based on different methods check each other. This allows the quality of the predictions to be increased significantly, e.g. by collecting all structure predictions (broad search) or by accepting only those found by both websites (particularly well-validated predictions).
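
The two ways of combining independent predictions can be sketched with simple set operations. The domain labels below are made-up placeholders, not real output of either website, and the function name `combine_predictions` is hypothetical.

```python
def combine_predictions(hits_a, hits_b):
    """Return (broad, validated) sets from two independent prediction lists."""
    a, b = set(hits_a), set(hits_b)
    broad = a | b        # broad search: everything found by either tool
    validated = a & b    # validated: only what both tools agree on
    return broad, validated


# Placeholder labels for illustration only.
expasy_hits = ["domain A", "domain B"]
andom_hits = ["domain B", "domain C"]

broad, validated = combine_predictions(expasy_hits, andom_hits)
print(sorted(broad))      # all three labels
print(sorted(validated))  # only "domain B"
```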

As a result, the predictions are sometimes rather sparse. This happens when only short parts of the sequence have sufficient similarity to the entries in AnDom's structure database. It can also happen that the protein structure is new, i.e. not similar enough to any known structure to allow a prediction. Just as with BLAST, a very small expectation value (E-value; 1 in one million or lower) means that AnDom has been very successful in revealing a structural similarity. In contrast, a random similarity can be recognized by a high expectation value (higher than 1 in 1000). It may even happen that a random sequence would find such a weak similarity several times. In this case, the expectation value is e.g. 4 if, on average, a random sequence would find four such hits in the AnDom structure database.
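
The rules of thumb above can be written down as a small helper. The two thresholds are the ones given in the text; the label "borderline" for values in between and both function names are our own assumptions. The conversion from an expectation value to the probability of at least one random hit assumes that random hits follow a Poisson distribution, as is commonly done for BLAST statistics.

```python
import math


def interpret_e_value(e_value, significant=1e-6, random_like=1e-3):
    """Classify an expectation value using the thresholds from the text."""
    if e_value <= significant:
        return "significant"
    if e_value > random_like:
        return "probably random"
    return "borderline"  # our own label for the range in between


def prob_at_least_one_random_hit(e_value):
    """P(at least one chance hit), assuming Poisson-distributed random hits."""
    return 1.0 - math.exp(-e_value)


for e in (1e-8, 1e-4, 4.0):
    print(e, interpret_e_value(e), round(prob_at_least_one_random_hit(e), 4))
```

For the example from the text, an expectation value of 4 corresponds to a probability of about 0.98 that a random sequence produces at least one such hit, so a hit of this kind carries essentially no information.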

1  Sequence Analysis: Deciphering the Language of Life